DOCUMENT RESUME

ED 266 174                                        TM 860 118

AUTHOR          Pace, C. Robert; And Others
TITLE           The Credibility of Student Self-Reports.
INSTITUTION     California Univ., Los Angeles. Center for the Study
                of Evaluation.
SPONS AGENCY    National Inst. of Education (ED), Washington, DC.
PUB DATE        Nov 85
GRANT           NIE-G-83-0001
NOTE            64p.; In: "Resource Papers and Technical Reports.
                Research into Practice Project" (TM 860 116).
PUB TYPE        Reports - Research/Technical (143)
EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     Academic Achievement; Attitude Measures; College
                Students; *Error of Measurement; Higher Education;
                Multivariate Analysis; Public Opinion;
                *Questionnaires; *Reliability; *Self Evaluation
                (Individuals); Student Characteristics; *Surveys;
                Tables (Data); *Validity
IDENTIFIERS     College Student Experiences Questionnaire; Entering
                Student Survey; Higher Education Research Institute;
                *Self Report Measures; Student Information Form

ABSTRACT
This report shows that there are many ways to confirm 
the accuracy, reliability, and validity of student self-reports. 
Examples from higher education and from public opinion polls and 
general surveys demonstrate some of the common sources of measurement 
errors and errors of substance. Part 1 of the report summarizes a few 
highlights from the literature, and adds comments from the author's 
research. Part 2 begins by briefly reporting a tabulation of "missing 
cases" in three questionnaires for college students. Two of these 
instruments are the Entering Student Survey, distributed by the 
American College Testing Program, and the Student Information Form, 
distributed by the UCLA Higher Education Research Institute; both are 
widely used, and each has the same general purpose and is intended 
for the same type of population. Following this, the College Student 
Experiences Questionnaire, designed to be filled out by undergraduates 
toward the end of the academic year, is discussed in detail. 
Test-retest comparisons of this questionnaire are used as examples of 
how subjective responses can be objectively validated. Predictive and 
construct validity of this questionnaire are examined using 
multivariate statistical procedures. (JKO) 
INTRODUCTION 



Whenever one presents the results of a questionnaire survey, there is 
always someone who says "But those are only opinions". If the results come 
from a survey of students, the put-down response is "But those are only 
students' opinions", as if, coming from students, the results are even less 
believable. If the comment comes from someone in the "hard" sciences, it 
is likely to be "But you only have 'soft' data". 

It's interesting that this sort of knee-jerk disbelief does not 
automatically occur in response to other surveys. The Census Bureau 
conducts many surveys that ask about people's opinions and plans. There 
are surveys to estimate consumer confidence which are taken seriously by 
economists and entrepreneurs. Political opinion surveys are carefully 
studied by candidates for office. Opinion surveys are an important aspect 
of market research. There is, of course, a certain skepticism about the 
credibility of some self-reports to the Internal Revenue Service. But on 
the whole, opinion polls, survey research, and questionnaires are widely 
accepted methods of inquiry, and certainly a very significant feature of 
scholarship in the social sciences. 

Opinion polls and attitude surveys, like other inquiries, are subject 
to errors of measurement. For more than fifty years there has accumulated 
a very large body of research on possible sources of error, and on ways to 
estimate reliability and validity. The Public Opinion Quarterly regularly 
publishes scholarly articles on the methodology of polls and surveys. The 
major polling agencies are especially sensitive about the accuracy and 
validity of their reports. Some of the best known survey centers are 
university-based, such as the National Opinion Research Center at the 
University of Chicago, and the Institute for Social Research at the 
University of Michigan. 

In higher education, and in education generally, questionnaires are 
quite common. There has also accumulated over a period of years a body of 
research on the credibility of students' answers to questionnaires. The 
present report on the credibility of student self-reports is a preliminary 
document that should, and perhaps may, become a more thorough and scholarly 
document at some future date. Meanwhile, we aim to present a few 
highlights from the large literature on measuring attitudes and other 
subjective phenomena, note some of the accuracy checks that have been made 
with respect to college student questionnaire responses, and then examine 
briefly the features of two current questionnaires for entering college 
students and explore more extensively one current questionnaire for 
undergraduates to illustrate a variety of reliability and validity 
estimates that can sometimes be produced to demonstrate the credibility of 
students' answers. 
PART 1 

ISSUES, ANSWERS, AND ADVICE 

The Russell Sage Foundation has recently published a definitive 
two-volume work entitled Surveying Subjective Phenomena (Turner and 
Martin, Editors, 1984). For anyone who wishes to review the literature of 
research on this topic, those two volumes are a fairly complete answer. In 
addition, the Russell Sage Foundation has also published a book by one of 
the most highly regarded scholars, Otis Dudley Duncan, Notes on Social 
Measurement: Historical and Critical, 1984, which deals with the whole 
domain of counting and classifying demographic and other elements, from 
antiquity to the present. 

In 1976 the College Entrance Examination Board published a monograph 
by Leonard Baird, Using Self-Reports to Predict Student Performance, which 
reports much of the evidence from college student surveys about the 
accuracy of their responses to questionnaire items, as well as their 
utility for prediction. 

Part 1 of this report is not a review of the literature in the usual 
sense. No attempt is made to cite chapter and verse from dozens of 
studies. Rather, everything (except as may be subsequently noted) that 
will be mentioned comes from one or more of the four major sources just 
cited. What follows, then, is my summary of what I regard as a few 
highlights from the literature, plus some of my own contributions to that 
literature over the past 50 years. 
Varieties of Self-Reports 

Some self-reports merely ask for obvious, easily verifiable 
information, such as age, sex, marital status. Each is a subjective or 
individual answer to an objective question. At the other end of the 
spectrum are questions and answers both of which are entirely interpreted 
by the individual. A good example is the following question: "Taken all 
together, how would you say things are these days: would you say that you 
are very happy, pretty happy, or not too happy?" An example from a survey 
of college alumni is the following: "What is your present feeling about 
your college? Strong attachment to it, pleasantly nostalgic but no 
strong feeling, more or less neutral, generally negative, thoroughly 
negative". The meaning of the question and of the response is determined 
by the respondent, and can be directly known only by the respondent. 

In one part of the appendix to Volume 1 of the Russell Sage report 
there is a "Scheme for classifying survey questions according to their 
subjective properties" (pages 407-431). The main categories of this scheme 
illustrate the varieties of self-reports one encounters in surveying 
subjective phenomena. There are three dimensions. The first is the 
referent of the question: objective versus subjective events. Objective 
questions refer to events that can be externally observed. Subjective 
questions refer to internal conditions, intuitions, beliefs, etc., which 
are directly knowable only by the individual. The second dimension is the 
nature of the judgment. Such judgments might involve beliefs, 
attributions, or valuations, and they involve different intellectual 
tasks. Simple judgments about the occurrence of events primarily involve 
recall. Attributions require generalizations and inference. One finds 
very generalized referents such as "most people", "all in all", "people 
running the country today", "most faculty members", etc. The 
interpretation of answers is complex and surely suggests the importance of 
skepticism. Valuations include questions about preferences, likes and 
dislikes, approval ratings, attitudes toward people, groups, organizations, 
policies, subjective sentiments such as confidence ratings, satisfactions, 
problems and worries. The third dimension is the object of the report: 
self versus other. Is the respondent being asked to report about himself? 
If so, do people tend to present themselves in a good light? How do these 
self-perceptions influence one's perception of others? 

These three broad categories, albeit overlapping in some respects, are 
useful to keep in mind as one examines the content of questionnaires: the 
referent of the question, the nature of the judgment, and the object of the 
report. 

Errors of Measurement 

In questionnaire surveys of college students the chief sources of 
unrepresentative results are the nature and size of the sample, and the 
proportion of people who return the questionnaire. Students in a large 
introductory psychology course are often asked or required to respond to 
some questionnaire. They, of course, are not a representative sample of 
anything. For relatively small colleges, the best advice is to give the 
questionnaire to everyone, thus bypassing the sampling problem. In big 
universities, the task of having all entering freshmen respond to a 
questionnaire is never successfully completed. If one can get two-thirds 
or three-fourths of the population one is doing rather well. There are 
good studies that have obtained data from a broad assortment of students 
and institutions; but nothing comparable to a national public opinion poll 
in its representativeness. The more significant problem, however, is in 
the response rate. Whether questionnaires are distributed via the U.S. 
Postal Service, or whether they are put in a campus mailbox, many are never 
returned. 

In a national questionnaire survey of students and alumni which I 
carried out in 1969, involving random samples at about 75 colleges and 
universities, the median response rate to the freshman questionnaire was 
80%, for the upperclassmen questionnaires the median response rate was 66%, 
and for the alumni samples the median response rate was 58%. The 
questionnaires, each about 16 to 20 pages in length, were attractively 
designed and printed; most colleges used one followup reminder; and for the 
alumni samples there were two followup reminders. 

Even if one had returns from everyone the basic conclusions would 
probably not change significantly; but in all questionnaire surveys there 
is probably some selectivity or bias among those who respond. In the 1969 
study the poorest rates of return from freshmen and upperclassmen came 
from the large institutions; but in the alumni questionnaire the 
differences in return rates were not related to size, they were related to 
institutional selectivity and prestige. In the elite categories, only 2 in 
20 (10%) had an alumni response rate of less than 50%; in the middle 
category scholastically, there were 10 of 39 (26%) with a response rate of 
less than 50%; and in the least selective category, there were 5 of 15 
(33%) with fewer than 50% returns from their alumni. 
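
The percentages in the preceding paragraph follow directly from the counts 
of institutions. As a minimal sketch (the variable and function names are 
illustrative, not part of the 1969 study), the arithmetic can be checked 
as follows: 

```python
# Counts reported in the text for the 1969 alumni questionnaire:
# (institutions with an alumni response rate below 50%, institutions in category).
below_half = {
    "elite": (2, 20),
    "middle": (10, 39),
    "least selective": (5, 15),
}


def share_below_half(count, total):
    """Percentage of institutions in a category with a response rate under 50%."""
    return 100.0 * count / total


shares = {name: share_below_half(c, n) for name, (c, n) in below_half.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.0f}% of institutions had under 50% alumni returns")
```

The share of low-return institutions rises as selectivity falls, which is 
the pattern described above. 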

In two recent questionnaire surveys of UCLA undergraduates, the 
response rates have been between 45% and 50%. There are, of course, ways to 
increase the rate of return of mailed questionnaires. Unfortunately, for 
academic researchers, they are very costly and the money is not 
forthcoming. 

Unlike the usual procedure in academic surveys, the national opinion 
polling agencies collect their data by interviews. The carefully designed 
stratified area sampling techniques do, in fact, produce reasonably 
reliable and valid results. The magnitude of non-response is minimal 
because the interviewer's job is to get everyone who fits the sample 
specifications. 

On several past occasions I have suggested that periodic polls of 
college students might be very worthwhile. But they would require 
developing an adequate base for sampling, and this does not now exist. The 
carefully designed sampling procedures, and the resulting national samples 
for public opinion polls, are not applicable to the college population. 

There are several other aspects to the present topic of measurement 
error. These relate to the estimation of reliability. Does one get 
similar answers to the same questions from comparable samples? In a 
test-retest situation, do people give the same answer the second time that 
they gave the first time? Do slightly different questions about the same 
topic result in generally similar responses? Most surveys in social 
science and in higher education do not report answers to any of these 
questions, and presumably do not collect evidence about any of these 
matters. But they should. And at least periodically they have. 
In 1948 a 16-page questionnaire was mailed to a sample of Syracuse 
University alumni. The questionnaire included two types of items which 
were subsequently readministered to a small sample. The questionnaire 
contained eleven Activity Scales of eleven items each, labeled Politics, 
Civic Affairs, Religion, Art, Music, Literature, and Science. The subjects 
checked each activity they had engaged in during the past year. The scales 
were Guttman-type scales in that participation in the more difficult 
activities tended to subsume participation in the easier and more common 
activities. The score on each scale was simply the number of activities 
checked. Then there were nine Opinion Scales of six items each, labeled 
Politics, Civic Relations, Government, the World, Philosophy, Art, Music, 
Literature, and Science. The statements in the opinion scales were written 
to reflect basic concepts or generalizations about the topics, 
generalizations reflecting a consensus of experts in the field, so that it 
was possible to score each scale simply by counting the number of 
statements on which one's opinion agreed with the opinions of the experts. 
Each statement was answered on a five point scale, from Strongly Agree to 
Strongly Disagree. Six months after the initial sample of 2500 had filled 
out the questionnaire, a second copy was sent to a small group of 120, 
of whom 68 returned it. The test-retest consistency of scores over this 
six-month interval was computed. For the Activity Scales, the correlations 
ranged from .70 to .89, with a median of .83. For the nine Opinion Scales 
the median test-retest correlation was .65, with seven falling between .60 
and .70, and two much lower ones of .40 and .31. Consistency of responses 
was also checked item by item. For the Activity items, the average percent 
of identical responses was 85, with a range from 83 to 87. For the Opinion 
items the average percent of identical responses was 75, with a range from 
68 to 84. The above test-retest data were reported in an article by Pace, 
"Opinion and Action: A Study in Validity of Attitude Measurement", 
Educational and Psychological Measurement, Vol. 10, No. 3, 1950, pages 
411-429. 
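
For readers who want to compute a test-retest correlation of the kind 
reported above, the following is a minimal sketch. It uses the ordinary 
Pearson product-moment formula; the scale scores are invented for 
illustration and are not data from the Syracuse study, and the function 
name is hypothetical. 

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    covariation = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return covariation / sqrt(ss_1 * ss_2)


# Hypothetical scores on one eleven-item activity scale (number of
# activities checked) for eight respondents, six months apart.
first_administration = [3, 7, 5, 9, 2, 6, 8, 4]
second_administration = [4, 7, 4, 9, 2, 5, 8, 5]

r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {r:.2f}")
```

A correlation computed this way describes the stability of scale scores; 
the item-by-item percent of identical responses is a separate check. 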

The ACT Evaluation/Survey Service, Users Guide, 1981, reported 
test-retest results on ACT's Student Opinion Survey for a group of students 
at one university who responded to the questionnaire a second time 
approximately two weeks after the initial response. The average percent of 
identical responses on the two administrations was 98% for demographic 
background items (age, race, sex, etc.), 90% for other background items 
such as hours worked per week, occupational plans, etc., and 93% for items 
about the usage of college programs and services. For "Satisfaction" items 
(responses on a five-point scale from Very Satisfied to Very Dissatisfied) 
referring to such matters as academic aspects of the college environment, 
rules and regulations, facilities, college services, etc., the percent of 
identical item responses was typically about 64%, and the percent of 
responses within one scale point of the identical response typically about 
95%. 
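
The two agreement figures used for the satisfaction items, percent 
identical and percent within one scale point, can be sketched as below. 
The ratings are invented for illustration (they are not ACT data), and the 
function name is an assumption of this sketch. 

```python
def agreement_rates(first, second):
    """Return (percent identical, percent within one scale point)."""
    pairs = list(zip(first, second))
    identical = sum(a == b for a, b in pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs)
    n = len(pairs)
    return 100.0 * identical / n, 100.0 * within_one / n


# Hypothetical ratings by ten students on one five-point satisfaction item
# (5 = Very Satisfied, 1 = Very Dissatisfied), two weeks apart.
first_ratings = [5, 4, 4, 3, 2, 5, 1, 3, 4, 2]
second_ratings = [5, 3, 4, 3, 4, 5, 2, 3, 5, 2]

exact_pct, within_one_pct = agreement_rates(first_ratings, second_ratings)
print(f"identical: {exact_pct:.0f}%, within one point: {within_one_pct:.0f}%")
```

Reporting both figures shows how much of the apparent inconsistency is a 
shift of only one step on the scale. 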

In the American Council on Education Research Report, Vol. 7, No. 2, 
1972 by Boruch and Creager, entitled Measurement Error in Social and 
Educational Survey Research, two examples of test-retest comparisons are 
cited. One example administered a questionnaire twice, with six weeks 
intervening, to a group of 107 college students. Questions about students' 
previous achievements resulted in 90% to 100% agreement. Answers to other 
facts, such as father's education and occupation, high school grades, 
etc., had agreement percentages from 74% to 92%. Attitudinal items, and 
questions about future plans typically involved agreement in the 60-70% 
range. The other example was the readministration of the ACE freshman 
survey questionnaire to 202 students following an interval of two to three 
weeks. Test-retest correlations for different types of items were as 
follows: demographic characteristics, mostly .96 to .99; sources of 
financial support, mostly .85 to .88; self-reported attributes of parents, 
mostly .60 to .82; items estimating the chances of future events (such as 
graduating with honors, joining a fraternity or sorority, failing one or 
more courses, changing career choice, etc.), mostly .58 to .88 with a 
median of .78; items about life goals such as the importance of being very 
well-off financially, raising a family, keeping up with political affairs, 
helping others in difficulty, mostly from .65 to .87 with a median of .73; 
attitudes toward the importance of various federal actions such as 
pollution control, school desegregation, veterans benefits, consumer 
protection, correlations ranging from .41 to .83 with a median of .63; and 
items about attitudes toward various campus and social issues such as 
"faculty promotions should be based on student evaluations" and "marijuana 
should be legalized", with test-retest correlations ranging from .57 to 
.88 with a median of .66. 

Both the ACT and ACE reports show that the greatest variability in 
responses is found in relation to questions that are ambiguous, or about 
topics to which students may not have given much prior thought or concern, 
or about attitudes which are themselves subject to various interpretations. 
In some cases, the test-retest correlations are low enough to raise doubts 
about the value of the responses, especially when the test-retest interval 
is only 2 to 6 weeks. For the more specific items, consistency of 
responses was quite high. 

In public opinion surveys there have been some examples of comparing 
the results to the same questions when asked by different survey 
organizations. The closest or most carefully controlled conditions are 
called tandem surveys. In one such tandem survey, NORC and Roper each drew 
probability samples and proceeded to administer the survey in their 
customary fashion. This was a survey about public use of and attitudes 
towards television. Differences in the results were small; but there was a 
clear effect related to how the organization determined the "don't know" 
(DK) responses. On 52 comparisons, NORC had fewer DKs on 42 items, Roper 
fewer on 4 items, with no differences on the other items. In another 
study, a survey about public attitudes and knowledge concerning survey 
practices, the sample was drawn by the Survey Research Center (SRC), and 
the cases randomly assigned to SRC and Census Bureau interviewers. In 
general, the results were fairly similar. However, the interviewee refusal 
rate was 6% for the Census Bureau interviewers and 13% for the SRC 
interviewers. 

A summary tabulation, reported in Volume 1 of the Russell Sage 
publication, of 126 instances in which the same questions were asked by 
different survey measurement programs at about the same time shows that in 
45 of the instances there were differences beyond the level typically 
allowed for sampling error. Such differences could have come from many 
sources: context, interviewer effects, training and staff differences, 
etc. Some of the differences were clearly attributable to how DKs were 
handled. Variations in practices produce differences in the products; but 
most of these differences are relatively small. When the conditions are 
most comparable, as in tandem surveys, the results are highly congruent. 
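
One common way to judge whether two organizations' results differ "beyond 
the level typically allowed for sampling error" is a two-proportion z 
test. The sketch below is only an illustration under the assumption of 
simple random samples (the national surveys discussed here used more 
complex designs, for which the standard error would be larger); the 
percentages and sample sizes are hypothetical, not figures from the 
Russell Sage tabulation. 

```python
from math import erf, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """z statistic and two-sided p-value for the difference of two proportions.

    Assumes independent simple random samples of sizes n1 and n2.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


# Hypothetical: 55% agree in one organization's sample of 1500,
# 50% agree in another organization's sample of 1500.
z, p_value = two_proportion_z(0.55, 1500, 0.50, 1500)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

A difference flagged this way says only that sampling error alone is an 
unlikely explanation; it does not identify which of the sources named 
above produced it. 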

Errors of Substance 

Whether people report accurately about their conditions or their 
behavior is, in one sense, an error of measurement and in another sense an 
error of substance. In surveys of college students there is a good deal of 
evidence that self-reports about their school grades, and about prior 
accomplishments are very accurate. Much of this literature has been 
summarized by Leonard Baird in the monograph he wrote for the College Board 
in 1976. Are students' self-reports of their grades accurate? Baird 
himself found that the correlation between college-reported and 
student-reported grades was generally about .87. In a study of 
self-reported and transcript-reported grades, by Nichols and Holland in 
1963 among National Merit Scholars, cited by Baird, the correlation was 
.96. Maxey and Ormsby in 1971 reported correlations between self-reported 
and school-reported grades in a sample of nearly 6000 students in 134 
schools to be on the average in the mid eighties. They found that 98% of 
the students' reported grades were accurate within one grade. Baird 
concludes from many studies that "research accumulated over 30 years, using 
various methods, in samples of grade school students, high school students, 
college applicants, junior college students, four-year college students, 
and professional school students, adds up to one conclusion: students' 
reports of their grades are about as useable as school-reported grades" 
(page 8). Moreover, self-reported grades predict future grades as well as 
or better than college entrance tests of academic ability. It seems fair 
to conclude that, at least for some kinds of questions, errors of substance 
in the answers are minimal. 

The data from the above studies are a good example of what one can 
expect when the questions are clear and specific, and when the response 
options are equally clear and specific. Students know the definition of 
grades and they know their own grades. Consequently, one can have 
confidence that the subjects can answer the questions. But in many surveys 
no such clarity is evident. 

Evidence from the larger survey research literature also confirms the 
accuracy of self-reports about various specific conditions or behavior. 
For example, correlations between employers' records about wages, duties, 
etc., and application blank work histories were generally .90 or greater. 
Adults' reports of whether they owned their home were 96% accurate; their 
reports of whether they had a valid library card were 87% accurate. One 
needs to be reminded here that "official records" are not always 100% 
accurate. 

Perhaps one of the most serious errors of substance arises from 
variations in the content, or wording, of the questions, and from the 
context in which the questions are used. There are some classic examples 
of this. The following question was asked in a national sample poll: "Do 
you think the United States should let Communist newspaper reporters from 
other countries come in here and send back to their papers the news as they 
see it?" Half the questionnaires asked this question after another 
question on whether the Soviet Union should allow in American newspaper 
reporters; and the other half of the questionnaires asked the questions in 
the reverse order. When the question about Communist reporters was asked 
first, 55% of the people agreed, but when the question about American 
reporters was asked first, 75% agreed. Or, consider the following two 
questions: 1) Do you approve or disapprove of a married woman earning 
money in business or industry if she has a husband capable of supporting 
her? (65% of a national sample approved); 2) If there is a limited number 
of jobs, do you approve or disapprove of a married woman earning money in 
business or industry when her husband is able to support her? (Only 36% 
approved!) Here is another example of different answers from slight 
differences in wording: "Do you think the United States should forbid 
public speeches against democracy?" (Yes, 54%.) "Do you think the United 
States should allow public speeches against democracy?" (No, 75%.) 
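
To make the size of these wording and context effects explicit, the 
percentages quoted above can be set side by side. This is a small sketch 
of the arithmetic only; the labels are my shorthand for the three 
examples, and no sample sizes are assumed. 

```python
# Percentages quoted in the text for each pair of question versions.
wording_pairs = {
    "Communist reporters (asked first vs. asked second)": (55, 75),
    "married woman earning money (plain vs. limited-jobs preface)": (65, 36),
    "speeches against democracy (yes to forbid vs. no to allow)": (54, 75),
}

# Absolute gap, in percentage points, between the two versions.
gaps = {label: abs(a - b) for label, (a, b) in wording_pairs.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap} percentage points")
```

In each case the shift is 20 percentage points or more, far larger than 
the test-retest inconsistencies discussed earlier. 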

Another type of error, potentially causing substantive or interpretive 
difficulties, is the use of response options that each person interprets in 
his own way. Examples of such response options are the use of words or 
phrases such as frequently, occasionally, rarely, most of the time, very 
much, quite a bit, usually, seldom, a great deal, very little, etc. 
Presumably words such as always and never mean the same to everyone. But 
how often is "often"? And how much is "very much"? 

Pace and Friedlander, "The meaning of response categories: how often 
is occasionally, often, and very often?", Research in Higher Education, 
Vol. 17, No. 3, 1983, addressed this issue using data from the College 
Student Experiences Questionnaire. Participation in various college 
activities was initially indicated by the responses "never", 
"occasionally", "often" or "very often". Later in the questionnaire seven 
of the same activities were responded to as follows: "For each of the items 
below, fill in one of the spaces to the left which best indicates the 
number of times you have engaged in the activity." These more specific 
responses were: "never", "once or twice during the year", "about three to 
six times during the year", "about once or twice a month", "about once a 
week" and "more than once a week". By this means we were able to show what 
students meant (number of times) by the more general words. The results, 
as one would expect, revealed considerable overlap in what was meant by 
occasionally, often, and very often. But there was also a clear 
concentration or clustering of responses as one moved from occasionally to 
often, and from often to very often. The meaning or definition of these 
general descriptors was different, depending upon the topic; but within the 
same topic the differences between colleges or types of students were quite 
small. In general, the definition of "occasionally" at one college was 
similar to its definition at other colleges, given the same topic. 

Every respondent knows perfectly well that "very often" is more than 
"often", and that "often" is more than "occasionally". Thus, the 
direction of the scale is recognized by everyone. But the specific meaning 
attached to the labels is an individual judgment. There were few obviously 
implausible responses, such as students who initially said "occasionally" 
or "often" but later said "never"; or students who initially said 
"occasionally" but later said "more than once a week". These discrepancies 
constituted from 2% to 10% of the total responses. 
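
The comparison of general and specific responses amounts to a 
cross-tabulation of the two answers each student gave to the same 
activity. The following is a minimal sketch with invented responses (not 
data from the questionnaire); the function name and the coding of 
"implausible" follow the two patterns named above and are otherwise 
assumptions of this sketch. 

```python
from collections import Counter


def is_implausible(general, specific):
    """Flag the two implausible combinations described in the text."""
    if general in ("occasionally", "often") and specific == "never":
        return True
    if general == "occasionally" and specific == "more than once a week":
        return True
    return False


# Hypothetical (general response, specific response) pairs for one activity.
responses = [
    ("never", "never"),
    ("occasionally", "once or twice during the year"),
    ("occasionally", "about three to six times during the year"),
    ("occasionally", "about once or twice a month"),
    ("often", "about once or twice a month"),
    ("often", "about once a week"),
    ("often", "about once a week"),
    ("very often", "about once a week"),
    ("very often", "more than once a week"),
    ("occasionally", "never"),
]

# Cell counts show what students mean, in number of times, by each general word.
crosstab = Counter(responses)
implausible_pct = 100.0 * sum(is_implausible(g, s) for g, s in responses) / len(responses)

for (general, specific), count in sorted(crosstab.items()):
    print(f"{general:>12} | {specific:<42} | {count}")
print(f"implausible responses: {implausible_pct:.0f}%")
```

Overlap appears as general words sharing a specific category (here both 
"occasionally" and "often" map to "about once or twice a month"), while 
clustering appears as each general word concentrating in a few cells. 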

Comparative judgments of this sort necessarily reflect some reference 
group in the mind of the judges. On this questionnaire, we assume that the 
college peer group is the reference group, and that the answers reflect an 
awareness of what is customary in one's own behavior and in the behavior of 
the peer group. 
The point of these observations about the subjective definition of 
response choices is that one should get, if at all possible, some sort of 
evidence about what people mean by their choices. This same advice applies 
to opinion polls which ask about degrees of happiness, satisfactions, 
confidence, or other subjectively defined responses. 
PART 2 

THREE COLLEGE QUESTIONNAIRES 

Efforts to evaluate the influences of college on students' learning 
and development should draw upon many sources of evidence. For much of 
this relevant evidence the students themselves are the source; and the most 
common method for obtaining that evidence is a questionnaire. 

Here, for example, are four crucial questions. 

1. Who goes? What do we know about the entering students: their 
high school record and test scores, their family background, financial 
status, their interests, expectations, aspirations, past achievements, 
etc.? Some of this information can be obtained from records, but some can 
be obtained only by asking the students themselves. 

2. What do they do after they get there? Some answers can be 
obtained from college records (such as campus residence and major field), 
but for other sorts of behavior (such as the time and effort devoted to 
study, contacts with faculty, involvement in extra-curricular activities, 
use of the library, etc.) the answers can only come from students' 
responses to questionnaires. 

3. What's it like? Physical facts (such as big school, small 
school, and big city, small town) are important. So also are students' 
perceptions of the campus environment or atmosphere. What is stressed? 
What is expected? How do people relate to one another: friendly, 
supportive, or not? Answers to these questions can only come from the 
students themselves. 

4. What do they get out of it? Knowledge, basic skills, and 
abilities relevant to a career, relevant to personal maturity and life 
satisfaction, relevant to civic enlightenment: these are some of the 
possible and intended results. Achievement tests, ability tests, 
personality tests, etc. can provide some of the answers. It may also be 
important to find out what the students themselves think they got out of 
college; and here again one relies on responses to questionnaires. 

Questionnaires can, and I think should, be regarded as a form of test 
or measuring instrument. Many questionnaires are in fact regarded as tests 
by those who construct them. So, we have tests of attitudes and beliefs, 
vocational interests, personality traits, etc. A variable or dimension to 
be measured is defined, sets of items are developed to measure it, and the 
reliability and validity of the results are determined. The process is 
similar to the construction of an objective achievement test, or a test of 
developed abilities such as the Scholastic Aptitude Test. Attitudes, 
interests, beliefs, etc., are subjective phenomena. The answer one gives 
to a question about interests or opinions is determined by the individual. 
The student decides whether he agrees or disagrees with some statement, or 
likes or dislikes some activity, or person, or condition. The good 
published tests of personality, interests, or values provide extensive data 
regarding their reliability and validity: tests such as the Minnesota 
Multiphasic, the Omnibus Personality Inventory, Holland's Vocational 
Preferences Inventory, the Allport-Vernon-Lindzey Study of Values. In some 
tests of this sort, the authors have included a few items to detect whether 
a student is giving false or improbable answers, a practice which 
recognizes the importance of estimating the credibility of self-reports. 

Many of the questionnaires used in studies of higher education are not
designed as tests in the classical sense. They consist of sets of items,
often grouped or classified under certain topics, but having no underlying
or scorable dimension. One finds, for example, various items about students'
use of counseling services, or various items about students' opinions of
teaching practices, or various items about students' attitudes regarding
political and economic ... The items are no doubt regarded as
interesting and the answers useful to know. But the content is best
described as a classified catalogue rather than as a theoretically or
conceptually based set of dimensions or characteristics. The value of the
question and the credibility of the answer have to be examined item by
item. There is nothing inherently unreliable or invalid about a one-item
test. Most public opinion polls are really one-item tests. But it is
important to realize that variations in responses are often caused by
variations in the phrasing of the question. Slight changes in wording can
produce significant changes in responses. Consequently, the meaning of the
answers rests on a slender base.

To begin Part 2 we briefly report a tabulation of "missing cases" in
three questionnaires for college students. The results illustrate some of
the principles and advice given in Part 1, and serve to confirm, with these
three current cases, the merit of that advice. Then, the main content of
Part 2 is a detailed examination of one questionnaire to illustrate some of
the internal and external checks that can be made to assess the reliability
and validity of students' responses. The content of this one instrument,
Pace's College Student Experiences Questionnaire, makes meaningful cross
checks possible, for it bears upon all four of the topics noted in the
introduction to Part 2: Who goes? What do they do after they get there?
What's it like? and What do they get out of it?

Missing Cases: What types of questions are not answered?

To provide some current illustrations of non-response to questionnaire
items we have examined two widely used instruments, each having the same
general purpose and each intended for the same type of population. The
first is the Entering Student Survey, distributed by the American College
Testing Program. The second is the Student Information Form, distributed
by the UCLA Higher Education Research Institute.

Both of these questionnaires are introduced with assurances regarding
the confidentiality of the students' responses. The HERI questionnaire
says, "Identifying information has been requested in order to make
subsequent mail followup studies possible. Your response will be held in
the strictest professional confidence". The ACT questionnaire says, "The
information you supply on this questionnaire will be kept completely
confidential. Your name, address, and Social Security number will enable
college officials to identify your responses and to contact you directly.
The data you supply will be used for research purposes and will not be
individually listed on any report. If, however, any question requests
information you do not wish to provide, feel free to omit it."

Both questionnaires have many similar and in some cases identical
items, for example: age, race, sex, marital status, planned college
residence, high school grades, planned college major, planned occupational
choice, sources of funding, reasons for going to college. Straightforward
identification questions, and questions about specific activities, reasons
for going to college, etc. are typically omitted by fewer than 4% of the
cases, and often by fewer than 2%. The questions which are omitted by the
largest percentages of respondents are ones related to money, religion,
expected major field and occupation, and assorted items about personal and
social values.

On the HERI questionnaire there are typically about 12% to 13% who do
not answer the items about parents' income, and sources of funding for
college. Many of those items identify specific dollar amounts (parents'
total income) or a specific fact (listed as a dependent on federal
income tax return). No doubt in some instances the students do not know the
answers; and perhaps in other instances they regard the question as
inappropriate. The ACT does not ask about dollars; it asks whether various
sources of funding are a major source, minor source, or not a source.
Eleven sources are listed, and about 5 1/2% to 9% of the students do not
respond.

The HERI questionnaire asks the students to indicate the religious
preference of self, father, and mother. From 15% to 17% do not answer the
question.

On the HERI questionnaire 6% of the entering freshmen do not indicate
their probable undergraduate major, and nearly 7% do not indicate their
probable career occupation. On the ACT questionnaire the percent of omits
is 12% for the probable major and 16% for the probable occupation. These
larger numbers may be owing to the format. The ACT survey
has a separate sheet inserted with the questionnaire listing many major
fields and occupations. The student finds the 3-digit code that best
describes his plans and then fills in these numbers on the questionnaire.
Apparently some students just don't bother to do this. On the HERI
questionnaire the various fields and occupations are listed on the
questionnaire itself, making the response easier to record. In both cases,
however, it seems reasonable to suspect that asking entering freshmen about
their probable college major and their probable occupation is not viewed as
an answerable question by some students. In fact, on a different part of
the HERI questionnaire more than 20% of the students said the chances were
very good that they would change their major and change their occupational
choice.

In both questionnaires, items about such topics as reasons for going
to college and reasons for going to their particular college were omitted
by only 2% to 4% of the respondents in most instances. The ACT
questionnaire has a section labeled "College Impressions" where students
are asked to indicate their agreement with various statements about the
college environment, such as "students at this college are friendly",
"this college offers many cultural events and programs". Typically about
3% omit these items; although one wonders about the basis for the answers
because often one's impressions, in advance of actual experience, reflect
common stereotypes about what college is like.

The HERI questionnaire asks students questions about various
political, social, and educational policies, such as "abortion should be
legalized", "college grades should be abolished", "the federal government
is not doing enough to control environmental pollution". Typically about
5% to 8% of the students do not answer these questions. Another question
asks students to characterize their political views, as far left, liberal,
middle-of-the-road, conservative, or far right. About 5% do not answer the
question.

For all of the above data, the information about the ACT questionnaire
comes from a normative report based on about 16,000 cases in which the
number of "blank" responses to every item is listed. For the HERI
questionnaire the data come from the 1983 report of freshmen norms in which
the data for one sample college are shown, having about 2,300 cases. The
complete normative report does not show missing cases.

Except for the questions about major field and probable occupation,
the number of "omits" in the ACT questionnaire is generally smaller than in
the HERI questionnaire. There may be several factors accounting for
this. The ACT questionnaire is shorter. The format and organization are
also clearer. Section 1 is labeled Background Information, Section 2 is
Educational Plans and Preferences, Section 3 is College Impressions.
Although in some parts the print is quite small, each part is enclosed in a
box, with the question or topic itself in boldface capital letters.
Perhaps more important is the likelihood that most students would not view
any of the questions as offensive or intrusive. There is no invasion of
privacy of the sort that might influence one to omit the answer or to give
a socially desirable answer rather than a more forthright answer. One can
easily regard the questions as appropriate to ask of entering students
because of the educational relevance of the questions.

The HERI questionnaire, although of the same four-page length as the
ACT, has many more items, and the format consequently appears crowded.
Also there is no obvious organization or sequence to its questions. The
reasons for not answering various questions, however, are probably owing
more to the nature of the questions than to the format. Questions about
the future -- such as "what is your best guess that you will": graduate
with honors, change career choice, transfer to another college, find a job
after college in the field for which you were trained, etc.? -- are
generally skipped by 1% to 6% of the respondents. Questions about
long-range aspirations or values are skipped by 5% to 8% of the students.
Also, as noted earlier, questions about political and personal attitudes
are typically skipped by 5% to 8% of the students. From one perspective,
these are not large percentages; and the conclusions one draws from those
who do respond would not be changed in any significant way if everyone had
responded. From another perspective, these percentages of missing cases
may represent the tip of a deeper and larger problem about the validity of
students' responses. There is no doubt that some students do not like some
of the questions. During the time of student activism in the late 1960s,
there were organized student protests against answering the sort of
questions that are still included in today's edition. At the end of the
questionnaire, 26% of the students do not give permission to include their
ID number on any tape for future research or follow-up study. This
undermines the validity of the data base for longitudinal studies.
Moreover, when one realizes that the response rate to a mailed follow-up
questionnaire may be only 50% or even less, then, together with the 26%
refusal to be involved, one is left with a respondent population that may
be only 1/3 or 1/4 of the population one should have.
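
The arithmetic behind the "1/3 or 1/4" estimate can be sketched as follows.
This is only an illustration, and it assumes that refusal of permission and
non-response to the follow-up are independent, which the report does not
state. The function name and the response rates below 50% are hypothetical.

```python
def retained_fraction(consent_rate: float, response_rate: float) -> float:
    """Fraction of the original group left after refusals and non-response.

    Assumes the two losses are independent (an assumption, not a finding).
    """
    for name, value in (("consent_rate", consent_rate),
                        ("response_rate", response_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return consent_rate * response_rate


REFUSAL_RATE = 0.26  # from the text: 26% withhold permission for follow-up
consent_rate = 1.0 - REFUSAL_RATE

# 0.50 is the follow-up response rate named in the text; the lower values
# are hypothetical stand-ins for "or even less".
for response_rate in (0.50, 0.40, 0.35):
    share = retained_fraction(consent_rate, response_rate)
    print(f"response rate {response_rate:.0%}: "
          f"{share:.0%} of the original group remains")
```

With a 50% response rate the retained share is a little over one third; it
approaches one quarter only if the follow-up response rate falls to roughly
35%.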

Missing cases have also been noted for a third instrument, Pace's
College Student Experiences Questionnaire. Later in this report a detailed
examination of the reliability and validity of responses will be
presented. At this point, only the data about missing cases are reported.
Most of the questionnaire consists of 142 college activities to which the
students respond by indicating whether and how often they have engaged in
them during the current school year. These are, for the most part, quite
specific events, and apparently quite easy to recall. Based on the
responses of about 7,500 undergraduates, the number of missing responses
was rarely more than 1%, and never more than 2%. These activities are
grouped into scales, usually of 10 items, for which an activity score can be
computed. If any item in a scale is not answered, no score is computed.
The number of missing cases in these scale scores is, in most scales, about
1% or less, and never more than 4%. In other parts of the questionnaire
students are asked to indicate how much progress they believe they have
made with respect to various goals or objectives, how well satisfied they
are with college, and how they would characterize the college environment
along various dimensions. The missing cases to these items are often fewer
than 1% and never more than 2%. In another brief section of the
questionnaire students are asked to indicate about how many textbooks they
read, how many non-assigned books, how many essay exams they had, and how
many other written reports they made during the current school year. The
percent of missing responses was typically from less than 1% to 2%, except
among students in not highly selective liberal arts colleges where there
were 3% to 4% missing cases. No obvious explanation comes to mind for
these somewhat larger percentages. With respect to the usual background
items (age, sex, year in school, etc.) there are typically no more than
1% or 2% missing cases, except for the questions about the student's major
field where the percent of missing cases ranges from 3% to 6% at different
types of institutions. Unlike the ACT and HERI questionnaires which are
given to beginning freshmen, the CSEQ is answered by undergraduates in
general, not just by freshmen, so that most of them do in fact have a
definite major field. Why there should be from 3% to 6% omits is a
mystery. Of course, not all possible major fields can be listed in the
questionnaire so that students may wonder where to classify their own
major. Also, especially in the more heterogeneous colleges, and also in
the most selective ones, there may be more interdepartmental majors or
other special options. Apparently, instead of checking "other" as the
proper response, they just omit the item.
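
The scoring rule mentioned above, that no scale score is computed when any
item in the scale is left blank, can be sketched as follows. This is a
minimal illustration and not the scoring program actually used: the 1-to-4
numeric coding of the four response options ("never" through "very often")
is an assumption, and the function names and sample answers are
hypothetical.

```python
from typing import Optional, Sequence

# Assumed numeric coding of the four response options (not stated in the text).
RESPONSE_CODES = {"never": 1, "occasionally": 2, "often": 3, "very often": 4}


def scale_score(responses: Sequence[Optional[str]]) -> Optional[int]:
    """Sum the coded responses; return None if any item was left blank."""
    if any(answer is None for answer in responses):
        return None
    return sum(RESPONSE_CODES[answer] for answer in responses)


def missing_rate(scores: Sequence[Optional[int]]) -> float:
    """Share of respondents for whom no scale score could be computed."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score is None for score in scores) / len(scores)


# Hypothetical answers from three students on one 10-item scale.
students = [
    ["often"] * 10,
    ["never"] * 9 + [None],  # one omitted item voids the whole scale score
    ["very often"] * 5 + ["occasionally"] * 5,
]
scores = [scale_score(answers) for answers in students]
print(scores)
print(f"missing scale scores: {missing_rate(scores):.0%}")
```

Because one blank item voids the whole scale, the share of missing scale
scores can exceed the share of missing responses on any single item.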

The College Student Experiences Questionnaire: A Brief Description

To understand some of the analyses to be reported next, some knowledge
about the content of this questionnaire may be helpful. The questionnaire
is meant to be filled out by undergraduates toward the end of the academic
year. It is an eight-page, 8 1/2 by 11 format, with the cover page
indicating what it's all about, and stating that "we do not ask you to write
your name anywhere in this questionnaire; but we do need to know where the
reports come from, and that is why each questionnaire has a number on the
back page -- certain blocks of numbers tell us that those questionnaires
come from your college". The first 1 1/2 pages consist of "Background
Information": the usual questions about age, sex, year in school, college
residence, major field, parents' education; and also time spent on academic
work, time on a job, main source of funding for college, grades, race, and
citizenship. The next 3 1/2 pages are labeled "College Activities". There
are 142 activities, grouped into "scales" or topics labeled library
experiences, experiences with faculty, course learning, art-music-theater,
student union, athletic and recreation facilities, clubs and organizations,
experiences in writing, personal experiences, student acquaintances,
science/technology, dormitory or fraternity/sorority, topics of
conversation, and information in conversations. The directions are: "In
your experience at this college during the current school year, about how
often have you done each of the following?" The responses are "never",
"occasionally", "often", and "very often". The activities are fairly
specific so that the student would presumably recall accurately whether he
had ever done them; but of course the frequency estimate is entirely a
subjective response. Examples of activities are as follows: read
something in the reserve book room or reference section, made an
appointment to meet with a faculty member in his/her office, summarized
major points and information in readings or notes, gone to an art gallery
or art exhibit on the campus, met your friends at the student union or
student center, played on an intramural team, worked on a committee, asked
other people to read something you wrote to see if it was clear to them,
sought out a friend to help you with a personal problem, made friends with
students from another country, practiced to improve your skill in using
some laboratory equipment, gone out with other students for late night
snacks, talked about current events in the news, referred to something a
professor said about the (conversation) topic.

The next brief part of the questionnaire asks students to report how 
much reading and writing they have done, and how well satisfied they are 
with college. 

The next main topic is labeled "The College Environment". This
consists of eight rating scales on which students report their impressions
of the emphasis or stress there is in the environment on such aspects of
students' development as academic and scholarly qualities, esthetic and
creative qualities, being critical and analytical, vocational and
occupational competence, and the general relevance and practical values of
the courses; also their impressions of the personal relationships in the
environment, ranging from supportive, helpful, considerate to alienated,
unsympathetic, and rigid with respect to the relationships among students,
between students and faculty, and with administrative personnel. Finally,
the last section, labeled "Estimate of Gains", lists 21 goals or objectives
of college education and asks students as follows: "In thinking over
your experiences in college up to now, to what extent do you feel you have
gained or made progress in each of the following respects?" The responses
are "very little", "some", "quite a bit", and "very much". Here again, the
responses are entirely subjective.

From one perspective, this College Student Experiences Questionnaire
includes features that some think should be avoided, if possible. The
ratings are entirely subjective, the estimates of the amount of gain are
entirely subjective, and the reports of frequency of participation in
activities are entirely subjective. What follows next are examples of how
subjective responses can be objectively validated.

Test-Retest Comparisons

In the absence of any major changes in the campus environment or
facilities or admissions policy, one would expect some consistency in the
amount, scope, and quality of effort revealed by students' responses to the
activities scales by different but comparable samples. Recently, several
colleges have used the College Student Experiences Questionnaire twice:
once in 1984, and again in 1985. Such comparisons are not, strictly
speaking, an indication of the reliability of self-reports. The answers
come from different students and from a different time. Some changes in
the responses may reflect true changes, not random changes or errors of
measurement. Nevertheless, if one found substantial variations in the
responses of two different but similarly selected samples, even though a
year apart, one would worry about the dependability of the results.

The best test-retest example comes from Denver University. It is best
in the sense that the sample size was fairly large: 635 in the spring of
1984 and 661 in the spring of 1985. The samples were selected in the same
way, the response rate was similar, and the two groups did not differ in
such population descriptors as age, sex, year in school, major field,
grades, residence, transfers, etc. No attempt is made to compare the
responses to every item in the questionnaire. Rather, to get a general
indication of stability, comparisons are made between the mean scores
on each of the 14 activity scales, and the mean ratings on each of the
environmental characteristics. It is not appropriate to report the scores
on these matters, but it is permissible to report the differences between
the 1984 scores and the 1985 scores. The second test-retest example comes
from Case Western Reserve University, with a sample of 779 students in
the spring of 1984 and of 376 in the spring of 1985. The characteristics
of the two samples are nearly identical with respect to age, sex, year in
school, transfer status, major field, etc. The third example comes from
Keuka, a small college for women in upstate New York, with 148 in the
1984 sample and 130 in the 1985 sample. The groups were similar in all
respects except one: the 1985 sample had a larger proportion of freshmen.

On the 10-item activity scales the possible range of scores is 30
points; 36 points on the three 12-item scales; and 20 points on the one
6-item scale. The typical standard deviations are 5.7 on the 10-item
scales, 6.0 on the 12-item scales, and 3.2 on the 6-item scale. A glance
at the list of differences in the table quickly reveals that at all three
schools the magnitude of differences is usually less than one point. This
is true of 13 out of 14 scales at Denver, all 14 at Case Western Reserve,
and 10 of the 14 scales at Keuka. In fact, at Denver the difference in
mean scores between the 1984 and 1985 samples is .5 or less on 10 of the
scales, at Case Western Reserve the differences are .5 or less on 13 of the
14 scales; and at Keuka on 6 of the 14 scales.
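
One way to judge how small these differences are is to express them as a
fraction of the typical standard deviation quoted above. The sketch below
is illustrative only and is not a calculation made in the report. The
function name is hypothetical, and treating the Student Union scale as a
10-item scale is an assumption, since the text does not say which scales
have 10, 12, or 6 items.

```python
# Typical standard deviations quoted in the text, keyed by scale length.
TYPICAL_SD = {10: 5.7, 12: 6.0, 6: 3.2}


def standardized_difference(mean_difference: float, n_items: int) -> float:
    """Absolute mean difference divided by the typical SD for that scale length."""
    if n_items not in TYPICAL_SD:
        raise ValueError(f"no typical SD quoted for a {n_items}-item scale")
    return abs(mean_difference) / TYPICAL_SD[n_items]


# A one-point difference on each type of scale.
for n_items in TYPICAL_SD:
    ratio = standardized_difference(1.0, n_items)
    print(f"{n_items}-item scale: 1.0 point = {ratio:.2f} SD")

# Denver's 2.4-point Student Union difference, ASSUMING a 10-item scale.
print(f"Student Union at Denver: {standardized_difference(2.4, 10):.2f} SD")
```

On the 10-item and 12-item scales a one-point difference is less than
one-fifth of a standard deviation.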

If comparable scores from comparable samples, even though a year
apart, are an indication of test reliability, then obviously these student
self-reports are very stable and dependable. At Denver, where there is a
significant difference of 2.4 points on the Student Union scale, the
explanation is a good example of a change in results owing to a change in
conditions. During 1984 at Denver a new student union and activity center
was under construction; 1985 was the first full year of its operation, and,
not surprisingly, the activity score for students' use of the union
increased significantly. At Keuka the differences between mean scores,
although greater than 1.00 on four of the scales, are not statistically
significant.

From these comparisons, one can surely conclude that self-reported
activities and self-reported ratings of environmental characteristics are
dependable and consistent.

Test-Retest Mean Differences, 1984 and 1985,
in Activity Scale Scores and Environment Ratings

                                      Denver    Case Western    Keuka
                                      Univ.     Reserve         College

Activity Scales

  Library Experiences                   .3         .4             .7
  Experiences with Faculty              .3         .1            1.3
  Course Learning                       .6         .2             0
  Art, Music, Theater                   .5         .4             .1
  Student Union                        2.4         .2             .4
  Clubs and Organizations               .5         .9             .5
  Athletic and Recreation               .1         .4             0
  Experience in Writing                 .5         .2             .2
  Personal Experiences                  0          .3             .7
  Student Acquaintances                 .6         .1            1.4
  Science/Technology                    0          .5            1.2
  Dormitory or Fraternity/Sorority      .9         0             1.5
  Topics of Conversation                .5         .4             .6
  Information in Conversations          .1         .2             .8

Environment Ratings

  Academic                              .1         0              .1
  Esthetic                              .1         .2             .2
  Critical/analytical                   .1         .1             .1
  Vocational                            .2         0              .2
  Personal Relevance                    .2         .2             .1
  Students                              .3         0              .1
  Faculty                              1.3         0              .2
  Administration                        .1         0              .4




External Validity: Self-reported gains vs objectively known achievement

Over the past 50 years hundreds of thousands of college students have
taken objective achievement tests in various college subjects, tests
constructed by national testing agencies. Certain conclusions from all
this testing are so consistent, and so obvious, that it almost seems
unnecessary to state them; but if one is to document that self-reported
achievement corresponds to objectively tested achievement, then some
examples of the test evidence must be given. The examples that follow are
reported in Pace, Measuring Outcomes of College, Jossey-Bass, 1979.

The first example shows the relationship between credit hours and test
scores from the Pennsylvania study in 1928. The obvious conclusion is that
students learn what they study, and the more they study the more they
learn. Students who had the most credit hours in the natural sciences had
the highest test scores on the natural sciences test items. The same is
true for credits and scores in language, literature and fine arts, and also
for credits and scores in social studies.

Credit Hours and Test Scores: 4500 Seniors from
45 Colleges in Pennsylvania, Tested in 1928

Natural Sciences Credits              Natural Sciences Test Scores
  High: 55 or more                        120
  Statewide average: 37                    78
  Low: 6 or fewer                          46

Language, Literature, and Fine        Language, Literature or
Arts Credits                          Fine Arts Test Scores
  High: 67 or more                        250
  Statewide average: 42                   168
  Low: 12 or fewer                        111

Social Studies Credit Hours           Social Studies Test Scores
  High: 97 or more                        292
  Statewide average: 52                   241
  Low: 12 or fewer                        196

The second example, some forty years later, comes from the Area tests
of ETS's Undergraduate Assessment Program. The test results are based on
47,000 seniors from 211 colleges in the years 1969-1971. For each of the
three Area tests (Humanities, Natural Sciences, and Social Sciences)
the average score for all seniors is compared with the average score of
seniors whose "area of interest" corresponds to the subject matter of the
test. The scores are standardized scores in which the standard deviation
is 100 points. In the humanities area the difference between the two
groups is 55 points. In the natural sciences area the differences are 57
points and 66 points. In the social sciences area the difference is 2
points. The sub-group is also part of the total group; and since 60% of
the total group of seniors are also in the social science interest group,
the difference is necessarily small.
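
The point about the sub-group being part of the total group can be made
concrete with a short calculation. The sketch below is illustrative only.
It assumes that the all-seniors average is the weighted average of the
interest group and everyone else, and that the reported shares (rounded
percentages) are exact. The function name is hypothetical, and the scores
and shares are the UAP figures reported for this example.

```python
def mean_of_rest(total_mean: float, subgroup_mean: float,
                 subgroup_share: float) -> float:
    """Mean of everyone outside the subgroup, from
    total = share * subgroup + (1 - share) * rest."""
    if not 0.0 < subgroup_share < 1.0:
        raise ValueError("subgroup_share must be strictly between 0 and 1")
    return (total_mean - subgroup_share * subgroup_mean) / (1.0 - subgroup_share)


# (all-seniors mean, interest-group mean, interest-group share of all seniors)
examples = {
    "Social Sciences": (446, 448, 0.60),
    "Humanities": (470, 525, 0.21),
}

for area, (total, subgroup, share) in examples.items():
    rest = mean_of_rest(total, subgroup, share)
    print(f"{area}: interest group vs all seniors = {subgroup - total} points; "
          f"interest group vs everyone else = {subgroup - rest:.1f} points")
```

Under these assumptions the gap between the social science interest group
and the remaining seniors is still only a few points, so the overlap
accounts for only part of the small difference.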

UAP Area Tests: Approximately 47,000 Seniors from
211 Colleges in the Years 1969-1971

Humanities Test Scores
  All seniors                                              470
  Seniors whose area of interest is in humanities
    (21% of all seniors)                                   525

Natural Sciences Test Scores
  All seniors                                              480
  Seniors whose area of interest is in biological
    sciences (12% of all seniors)                          537
  Seniors whose area of interest is in physical
    sciences (7% of all seniors)                           556

Social Sciences Test Scores
  All seniors                                              446
  Seniors whose area of interest is in social
    sciences (60% of all seniors)                          448

The third example comes from the UAP tests in designated major fields
rather than from the more general Area tests. These results are shown in
relation to the number of courses students had taken in their major:
fewer than eight vs eight or more courses. It is unlikely that, in one's
major field, one would have fewer than six courses and still qualify as a
major. Most likely, the comparisons are between students who have had 6 or
7 courses vs those who have had 8 to 12 courses. Again, the more one
studies a subject the more one knows about it.

Given these obvious conclusions from decades of achievement testing,
one can surely use them as external validity in relation to self-reported
achievement. The College Student Experiences Questionnaire, in the section
labeled Estimate of Gains, lists 21 objectives. Students are asked "to
what extent do you feel you have gained or made progress in each ...?"
They could check "very little", "some", "quite a bit", or "very much". Not
all of the objectives are associated with a specific major field, or even
with any course-related experience: objectives such as "ability to
function as a team member", "ability to learn on your own, pursue ideas, and
find information you need". There are, however, eight goals that are
related to the curriculum, and specifically to a major field within the
curriculum, or to a specific type of subject-matter. These subject-matter
goals include Fine Arts, Literature, English (writing), Science,
Technology, Computers, Quantitative thinking, and Philosophy/Cultures. If
student self-reports are valid they should show the same results that test
scores show: higher achievement (progress) by students whose major field
is similar to the objectives as compared with the average of all students.
This is exactly what the results do, in fact, very clearly show.




Scale Scores of Seniors on Major Field Tests of the Undergraduate
Assessment Program, 1969-1971, Related to Number of Courses Taken
in the Major Field

                                   Fewer than       Eight or
                                   eight courses    more courses    Difference

Sciences and Engineering Tests
  Biology                              539              566            +27
  Chemistry                            510              539            +29
  Engineering                          506              528            +22

Humanities Tests
  History                              458              491            +23
  Literature                           455              491            +36
  Philosophy                           514              551            +37
  French                               448              486            +38

Note: The number of students tested varies by major field, ranging
from approximately 1,000 to 8,000.

The data presented here are from a composite of 13,650 undergraduates
from 49 colleges and universities who responded to the CSEQ in the spring
of 1983, 1984 or 1985. Only those colleges that had given the
questionnaire to all four undergraduate classes are included in these
composite results. Note also that knowledge or progress is necessarily
less among freshmen or sophomores, who have not yet accumulated many credits
in what is or will be their major field, than it would be among juniors and
seniors who, by definition, have accumulated a much larger number of
credits in their chosen major field. For some, then, the "major" may
reflect an "area of interest" and for others it may be a course program
nearly completed.

In the list below, the first four goals are related to the subject
matter of arts and humanities, and the second four goals are related to the
sciences. Among students who identified their major field as "Arts (art,
music, theater, etc.)", 92% reported substantial gain ("quite a bit" plus
"very much") toward the objective "developing an understanding and
enjoyment of art, music and drama". This high percentage contrasts with
29% among students in general. For the objective related to literature,
74% of humanities majors reported substantial gain compared with 33% for
students in general. With respect to writing clearly and effectively, 80%
of the humanities majors reported substantial progress compared with 57% of
students in general. The goal described as "becoming aware of different
philosophies, cultures, and ways of life" is not so clearly tied to
classroom subject matter in the sense that students' interpersonal
experiences might well contribute significantly toward its attainment; but
presumably courses in philosophy, history, anthropology, etc. would also be
influential. The results show substantial progress reported by 70% of
humanities majors, and 64% of social sciences majors, compared with 51% by
students in general.

Comparisons of Self-Reported Gains
with Known Data About Achievement

Gains in developing an understanding and enjoyment of art, music, and drama
    ARTS majors reporting substantial gains                      92%
    average of all students                                      29%

Gains in broadening your acquaintance and enjoyment of literature
    HUMANITIES majors reporting substantial gains                74%
    average of all students                                      38%

Gains in writing clearly and effectively
    HUMANITIES majors reporting substantial gains                80%
    average of all students                                      57%

Gains in becoming aware of different philosophies and cultures
    HUMANITIES majors reporting substantial gains                70%
    SOCIAL SCIENCE majors reporting substantial gains            64%
    average of all students                                      51%

Gains in understanding the nature of science and experimentation
    BIOLOGICAL SCIENCES majors reporting substantial gains       85%
    PHYSICAL SCIENCES majors reporting substantial gains         76%
    average of all students                                      36%

Gains in understanding new scientific and technical developments
    BIOLOGICAL SCIENCES majors reporting substantial gains       74%
    PHYSICAL SCIENCES majors reporting substantial gains         66%
    ENGINEERING majors reporting substantial gains               66%
    average of all students                                      31%

Gains in acquiring familiarity with the use of computers
    COMPUTER SCIENCE majors reporting substantial gains          90%
    ENGINEERING majors reporting substantial gains               65%
    average of all students                                      32%

Gains in quantitative thinking: understanding probabilities, proportions,
etc.
    PHYSICAL SCIENCES majors reporting substantial gains         68%
    ENGINEERING majors reporting substantial gains               68%
    average of all students                                      47%
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In "understanding the nature of science and experimentation", 
substantial progress 1s claimed by 85X of biological sciences majors and 
761 of physical sciences majors, compared with 36Z for students 1n 
general. A similar result 1s shown for "understanding new scientific and 
technical developments", with percentages of 74% and 661 for scientific and 
technical majors, compared with 311 for the average of all students* The 
contrasting percentages for the goal "acquiring familiarity with the use of 
computers" are even sharper — 90X of majors 1n computer science Indicating 
substantial progress compared with 321 for the average of all students. 
With respect to quantitative thinking, students majoring 1n fields where 
much quantitative thinking 1s required — engineering, and physical 
sciences — are most likely to claim substantial progress (681) compared 
with 47X among students 1n general. 

All of the above results document the external validity of students'
self-reports. When asked to rate their progress toward goals that are
obviously related to the subject matter of college courses, the ratings are
totally congruent with what we know from achievement test scores and from
the relationship between credit hours or amount of study and measured
achievement.

One does not know the actual level of measured achievement
(standardized test scores) that is associated with the students'
self-estimate of gain. No doubt some students who rate their own progress
as "quite a bit" may have higher achievement test scores than students at
another college who rate their progress as "very much". Such discrepancies
probably reflect institutional differences in academic selectivity and
academic demands. The same variability applies to credit hours vs. test
scores. While it is true that the more courses one takes in a subject the
more one is likely to know about it, it is also true that some students who
have taken 5 or 6 courses may get higher test scores than some students who
have taken 9 or 10 courses. But the averages are consistent. Sorting
students according to course work (major field), or according to achievement
test scores (major field), or according to self-reported progress (in major
fields) produces the same conclusions in each case.



Internal Reliability: Consistency in responses to similar items



In the Science/Technology activity scale there are three activities
that clearly involve conversation about science. These items, together
with the percent of students who said they engaged in them frequently, are
shown below:

                                       % Frequently among
                                  Bio.Sci.  Phys.Sci.  Engr.    Average of
Science activities                majors    majors     majors   all students

Tested your understanding of
some scientific principle by
seeing if you could explain
it to another student.               70        69        69         34

Showed a classmate how to use
a piece of scientific
equipment                            43        43        35         18

Attempted to explain an
experimental procedure to
another student                      42        34        41         15

Conversation topic

Science: theories, experi-
ments, methods                       57        53        58         21

The conversation item appears in a different part of the
questionnaire. Presumably, the percent of students who say they have
frequently talked about science with other students should have some
similarity to the percent who said they had tried to explain a principle,
a procedure, or the use of equipment to another student. The responses
were, in fact, very similar.

A similar comparison can be made in the arts. In the activity scale
labeled Art, Music, Theater there are three "talk about" items, and later,
among the conversation topics there is a topic described as "Fine arts -
painting, theatrical productions, ballet, symphony, etc.". Here are the
results.

                                       % Frequently among
Art, Music, Theater                                    All
activities                            Arts majors      students

Talked about art (painting,
sculpture, architecture,
artists, etc.) with other
students at the college                   68              17

Talked about music (classical,
popular, musicians, etc.) with
other students at the college             73              35

Talked about the theater (plays,
musicals, dance, etc.) with
other students at the college             58              20

Conversation topic

Fine arts: painting, theatrical
productions, ballet, symphony, etc.       78              17

Similar but not identical questions produce similar but not identical
answers. The general congruence shown above can be regarded as an
indication of internal reliability.

Internal Validity: Finding plausible connections

For the attainment of many goals of higher education there is no
readily available objective documentation and in some cases no external
evidence at all. One can use tests and credits when the goals are related
to the curriculum or to particular courses and major fields. But what does
one use for an external criterion when the goals are self-understanding,
understanding others, good health habits, functioning as a team member,
etc.?

In this part of the report several examples of internal consistencies
that should be found are used to make the case for the credibility of
self-reports. The first example is surely a connection that should exist.
The activities (setting performance goals, following a regular exercise
schedule, and keeping a record of progress) are, to a considerable
extent, behavioral indicators of what is involved in "developing good
health habits and physical fitness". The tabulations show that students
who report "very much" progress toward this goal are much more likely to
set goals, follow a schedule, and keep a record than students whose
self-rated progress is lower.

Similar tabulations are shown for several other goals. In every case,
the behavior that surely should contribute to students' estimated progress
is clearly related to that progress. The differences in percents between
"very much" and "very little" are uniformly large, the one being from 2 to
more than 6 times larger than the other.
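
The size of these differences can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies three rows of percentages from the tabulations in this section and computes the ratio of the "very much" percent to the "very little" percent. The function and variable names are assumptions made for the example, not part of the original analysis.

```python
# Percent engaging "frequently", by self-rated progress, copied from the
# tabulations in this section: (very much, quite a bit, some, very little).
ROWS = {
    "Set goals for your performance in some skill": (77, 58, 36, 23),
    "Followed a regular schedule of exercise": (71, 53, 23, 14),
    "Played on an intramural team": (36, 26, 15, 7),
}


def ratio_high_to_low(percents):
    """Ratio of the 'very much' percent to the 'very little' percent."""
    very_much, _, _, very_little = percents
    return very_much / very_little


def is_monotone_decreasing(percents):
    """True when each lower progress rating has a lower or equal percent."""
    return all(a >= b for a, b in zip(percents, percents[1:]))


ratios = {name: round(ratio_high_to_low(p), 1) for name, p in ROWS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} times")
```

A steadily declining row, together with a large ratio between its two ends, is the pattern the text describes as a plausible connection between behavior and reported progress.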

If student responses to the gains items or to the activity items were
capricious or unreliable or invalid, the congruent and plausible
connections shown in the tables below would not occur. If what should be
true is also true empirically, the credibility of self-reports is further
documented.

Goal: Developing good health habits and physical fitness

                                  Percent engaging in activity frequently
                                  among students who rate their progress as:

                                  Very   Quite          Very    Average of
Activity                          Much   a bit   Some   Little  all students

Set goals for your performance
in some skill (athletic)           77     58      36      23        45

Followed a regular schedule of
exercise, or practice in some
sport, on campus                   71     53      23      14        38

Kept a chart or record of your
progress in some skill or
athletic activity.                 28     15       6       3        11


Goal: Ability to function as a team member

                                  Percent engaging in activity frequently
                                  among students who rate their progress as:

                                  Very   Quite          Very    Average of
Activity                          Much   a bit   Some   Little  all students

Used outdoor recreational spaces
for casual and informal group
sports                             40     27      15       7        23

Used facilities in the gym for
playing sports that require
more than one person               42     30      18      10        26

Played on an intramural team       36     26      15       7        22


Goal: Understanding yourself: your abilities, interests, and personality

                                  Percent engaging in activity frequently
                                  among students who rate their progress as:

                                  Very   Quite          Very    Average of
Activity                          Much   a bit   Some   Little  all students

Read articles or books about
personal adjustment and
personality development            38     25      20      15        28

Asked a friend to tell you
what he/she really thought
about you                          33     21      14      12        23

Identified with a character in
a book or movie and wondered
what you might have done under
similar circumstances              56     44      36      32        46


Goal: Understanding other people and the ability to get along with
different kinds of people

                                  Percent engaging in activity frequently
                                  among students who rate their progress as:

                                  Very   Quite          Very    Average of
Activity                          Much   a bit   Some   Little  all students

Made friends with students
whose interests were very
different from yours               73     57      38      32        59

Made friends with students whose
family background (economic
and social) was very different
from yours                         78     63      44      36        63

Had serious discussions with
students whose political
opinions were very different
from yours                         45     33      26      22        35


Goal: Becoming aware of different philosophies, cultures, and ways of life

                                  Percent engaging in activity frequently
                                  among students who rate their progress as:

                                  Very   Quite          Very    Average of
Activity                          Much   a bit   Some   Little  all students

Made friends with students
whose race was different
from yours                         62     50      40      33        46

Made friends with students
from another country               50     24      24      20        31

Had serious discussions with
students whose philosophy
of life or personal values
were very different from yours     64     48      33      25        43

Had serious discussions with
students whose religious
beliefs were very different
from yours                         55     40      28      22        36

Had serious discussions with
students from a country
different from yours               42     25      15      13        23


Summation

Claims for the credibility of student self-reports can be supported by:

1. Evidence of test-retest consistency.

2. Congruence with externally known facts, when such facts are
available.

3. Similar answers to questionnaire items, when questions are
asked in more than one way.

4. Congruent, or expected, connections between items that
presumably should have connected responses, as for example between
behavior and progress.

One final note may be important. Some psychometricians and survey
research analysts point out that the context within which questions are
asked may influence the response. In the College Student Experiences
questionnaire, some people might claim that the answers to the Estimates of
Gains items might be "contaminated" by all the preceding items. The gains
might be reported differently if they were asked separately, or without the
prior context in the questionnaire. There is, however, a very different
way of regarding this matter. If the gains items were presented alone,
without any context, the responses would be all the more influenced by
personal idiosyncrasies, and hence all the more likely to produce random
variations. By putting the gains items at the end of the questionnaire,
one increases the credibility of answers. Everyone comes to these items
with the same background, having recalled one's behavior during the year,
having characterized the college environment, having reported how much one
has studied, what grades one has received, etc., so that, for everyone, the
estimate of gains becomes a more or less commonly based and thoughtful
summary of the college experience, and therefore has a greater reliability.

Finally, as a capstone illustration of what can be done to assess the
reliability of self-reports, we apply some multivariate statistical
procedures which bear upon the predictive and construct validity of certain
parts of the College Student Experiences questionnaire.

Multivariate statistical procedures 

In this part of the report we describe the use of common multivariate
statistical procedures to assess the validity of self-report. The goal is
to demonstrate that for surveys that allow internal validity checks, one
can go beyond item-by-item validity to assessing the validity of self-
report at the construct level. These techniques are applied to a sample of
6,000 undergraduates who provided responses to the College Student
Experiences Questionnaire (CSEQ).

Three techniques were applied to two types of scales and one
background variable of the CSEQ. The background variable is academic major
coded as: 1) Arts; 2) Biological Sciences; 3) Business; 4) Computer
Science; 5) Education; 6) Engineering; 7) Health related fields; 8)
Humanities; 9) Physical Sciences; 10) Social Sciences. The two types of
scales are composed of 13 subscales from the Quality of Effort (QE)
measures, and 21 items from the Estimate of Gains (EG) measures.

The first statistical procedure is discriminant analysis with special
attention paid to the classification phase of the analysis. The
discriminating variables are the EG items while the classification variable
is academic major. Since the numbers of undergraduates in each major are
not the same, special a priori weighting is given to the samples during the
classification phase. The rationale for using discriminant analysis in
this context is that those who major in certain academic disciplines
probably make the most gains in those areas related to that discipline.
Hence, if one knows a student's set of responses to the gains items, one
should be able to predict that individual's major. To the extent that this
is true, it might be argued that the EG measures provide valid self-report
measures of gains.
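
The logic of the classification phase can be sketched in a few lines. The example below is a minimal illustration, not the procedure actually run for this report: it fits a linear discriminant function with a pooled covariance matrix to two hypothetical gain items and two hypothetical groups, and it assumes that the "a priori weighting" means prior probabilities proportional to group size. All data values and function names are invented for the example.

```python
import math


def mean_vector(rows):
    """Mean of each of the two items over a list of (item1, item2) rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(groups, means):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n_total = 0
    for label, rows in groups.items():
        m = means[label]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        n_total += len(rows)
    dof = n_total - len(groups)
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def invert_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def fit(groups):
    """Return group means, inverse pooled covariance, and priors."""
    means = {k: mean_vector(v) for k, v in groups.items()}
    inv = invert_2x2(pooled_covariance(groups, means))
    n_total = sum(len(v) for v in groups.values())
    # Assumption: priors proportional to group size.
    priors = {k: len(v) / n_total for k, v in groups.items()}
    return means, inv, priors


def classify(x, model):
    """Assign x to the group with the largest linear discriminant score."""
    means, inv, priors = model
    best_label, best_score = None, -math.inf
    for label, m in means.items():
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]  # inverse covariance times mean
        score = (x[0] * w[0] + x[1] * w[1]
                 - 0.5 * (m[0] * w[0] + m[1] * w[1])
                 + math.log(priors[label]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical ratings (1-4) on an arts-gain item and a science-gain item.
groups = {
    "Arts": [(4, 1), (4, 2), (3, 1), (3, 2)],
    "Engineering": [(1, 4), (2, 4), (1, 3), (2, 3), (1, 4), (2, 3)],
}
model = fit(groups)
print(classify((4, 1), model), classify((1, 4), model))
```

In this toy data a student rated midway between the two groups, for example (2.5, 2.5), is assigned to the larger group, which is the effect prior weighting is meant to have.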

The second procedure is canonical correlation analysis applied to the
QE subscales and the EG items. This procedure attempts to find a set of
linear combinations (canonical variates) within a scale that are maximally
correlated with linear combinations formed from the other scale. To the
extent that these canonical variates are interpretable, we would expect
high canonical correlations among those variates from each set that have
something in common. Often it is the case that canonical analysis obscures
the simple factor structure that might exist within a set of items. To
address this problem, the third procedure is to factor analyze the QE
subscales and EG items separately, rotate the factors for maximum
interpretability, calculate factor scores, and correlate factor scores using
simple Pearson correlations. It is expected that the Pearson correlations
will be highest among those factors that are substantively
related.
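
The last step of the third procedure, correlating factor scores, needs nothing more than the Pearson formula. The sketch below is illustrative only: the factor scores are hypothetical values invented for the example, and the variable names are assumptions rather than labels from the CSEQ analysis.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical factor scores for five students.
qe_science = [-1.2, -0.4, 0.1, 0.6, 0.9]      # effort factor
eg_science = [-1.0, -0.5, 0.2, 0.4, 0.9]      # substantively related gain factor
eg_vocational = [0.3, -0.8, 0.9, -0.6, 0.2]   # unrelated gain factor

r_related = pearson(qe_science, eg_science)
r_unrelated = pearson(qe_science, eg_vocational)
print(round(r_related, 2), round(r_unrelated, 2))
```

If the self-reports are valid at the construct level, the correlation for the substantively related pair should stand out from the others, as it does in this invented example.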

Discriminant and classification analyses were performed using ten
academic majors and twenty EG items. The EG items were chosen to
correspond as closely as possible to the academic majors; hence the item
related to gains in vocational training was omitted, since no major was
uniquely vocational.

The results of the classification phase are displayed in the following
table. The table shows the percentage of those who were classified into
their known majors on the basis of the discriminant analysis. The
diagonal represents the percentage of correct classifications, while the
off-diagonal represents the misclassifications. It can be seen that the EG
responses tend to do well in predicting academic major. For example, 61%
of all art majors were correctly classified as being art majors on the
basis of the discriminant analysis. Certain incorrect classifications did
occur, but the misclassifications were in a sensible direction. For
example, physical science majors (including chemistry and math) were more
often classified as biological science majors (including biochemistry) or
engineering majors. Overall, these results lend support to the claim that
self-reports of gains as measured by the EG data are valid in that they
adequately predict a relatively objective measure of the academic field where
most gains should occur.
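
The reading of the classification table described above can be reproduced mechanically. The sketch below is illustrative only: it copies the row percentages from the classification table in this section and pulls out the diagonal entry and the largest misclassifications for a given major. The short major labels and the function names are assumptions made for the example.

```python
MAJORS = ["Arts", "Bio", "Bus", "CompSci", "Educ", "Engr",
          "Health", "Human", "PhysSci", "SocSci"]

# Rows are true majors; entries are percent classified into each predicted
# major, in the order of MAJORS (copied from the classification table).
TABLE = {
    "Arts":    [61, 4, 11, 1, 0, 0, 0, 11, 0, 11],
    "Bio":     [1, 52, 7, 1, 0, 15, 13, 0, 0, 10],
    "Bus":     [2, 1, 73, 4, 2, 3, 2, 1, 0, 12],
    "CompSci": [1, 1, 33, 45, 0, 9, 0, 1, 0, 1],
    "Educ":    [9, 2, 34, 2, 15, 3, 9, 7, 0, 13],
    "Engr":    [1, 10, 14, 8, 0, 57, 4, 0, 1, 5],
    "Health":  [2, 20, 15, 0, 4, 4, 34, 1, 0, 19],
    "Human":   [8, 3, 11, 1, 3, 1, 3, 38, 0, 32],
    "PhysSci": [0, 33, 17, 7, 1, 25, 5, 1, 3, 7],
    "SocSci":  [4, 7, 28, 2, 4, 3, 4, 9, 0, 39],
}


def percent_correct(true_major):
    """Diagonal entry: percent of a major classified into that same major."""
    return TABLE[true_major][MAJORS.index(true_major)]


def top_misclassifications(true_major, k=2):
    """The k predicted majors (other than the true one) with the largest share."""
    others = [(pct, m) for m, pct in zip(MAJORS, TABLE[true_major])
              if m != true_major]
    others.sort(key=lambda pair: -pair[0])  # stable sort keeps table order on ties
    return [m for _, m in others[:k]]


print(percent_correct("Arts"))
print(top_misclassifications("PhysSci"))
```

For physical science majors the two largest off-diagonal entries are biological sciences and engineering, which is the "sensible direction" noted in the text.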

Classification Analysis of Academic Major
on Basis of Discriminant Analysis*

                                   Predicted Major
True Major   Art  Bio  Bus  C/S   Ed  Engr  Heal  Hum  PhyS  SocS   Total

Arts          61    4   11    1    0    0     0    11    0    11     100
Bio            1   52    7    1    0   15    13     0    0    10     100
Bus            2    1   73    4    2    3     2     1    0    12     100
CompSci        1    1   33   45    0    9     0     1    0     1     100
Educ           9    2   34    2   15    3     9     7    0    13     100
Engr           1   10   14    8    0   57     4     0    1     5     100
Health         2   20   15    0    4    4    34     1    0    19     100
Human          8    3   11    1    3    1     3    38    0    32     100
Phys/Sci       0   33   17    7    1   25     5     1    3     7     100
Soc/Sci        4    7   28    2    4    3     4     9    0    39     100

TOTAL          5   11   33    6    3   11     7     6    0    17     100

* Entries are in percentages.

The results of the canonical analysis are displayed in the next
table. Here, only the first two canonical variates extracted from each
set of measures are presented. Note that the standardized canonical
coefficients can be loosely interpreted as factor loadings.

Standard Canonical Variates for
QE Scale and EG Scale Items

                        QE Canonical                              EG Canonical
                          Variates                                  Variates
QE Subscales             QE1    QE2     EG Items                   EG1    EG2

Library                  .02    .05     Professional, Sci.
                                        or Scholarly               .03    .10
Faculty                 -.06    .10     General Education          .02   -.02
Course Work             -.05†   .15     Career Development         .07    .03
Art, Music, Drama       -.16    .66     Art, Music, Drama         -.14    .64
Student Union            .00   -.05     Literature                 .07†   .16
Recreation              -.00   -.05     Writing                   -.17   -.13
Clubs                    .00    .04     Computers                  .51    .09
Writing                 -.14   -.03     Philosophies/Cultures      .08†   .09
Personal Experiences†   -.14   -.03     Ethical Standards          .02    [illegible]
Acquaintances           -.04    .01     Personality                .03†   .01
Sci/Tech.                .94    .21     Understanding People       .05    .08
Conv. Topics             .03    .21     Team Member†               .01    [illegible]
Conv. Information†      -.06    .20     [label illegible]          .03    [illegible]
                                        Science Experim.           .27    [illegible]
                                        Science/Technology         .32    .15
                                        Technology/Hazards         .00    .14
                                        Analytical Thinking        .05   -.01
                                        Quantitative Thinking      .16   -.15
                                        Similarities and
                                        Differences                .11    .20

† Label, value, or sign only partly legible in the source copy.

Inspection of the standardized canonical coefficients for the QE
subscales suggests that the first canonical variate is dominated by the QE
subscale related to Science and Technology. The first canonical variate for
the EG data appears to be dominated by those items related to computer
knowledge and Science/Technology. The squared canonical correlation between
these two canonical variates is statistically significant [R² = 0.61,
F(260, 44745) = 42.979, p < .000].

Inspection of the second canonical variates for both sets of
measures reveals a similar consistent picture. The second canonical
variate for the QE scale is related to art, music, and theater, while the
second canonical variate of the EG responses is related to gains in
understanding art, music, and drama. Again, the squared canonical
correlation between this pair of variates is statistically significant
[R² = 0.38, F(228, 41681) = 29.011, p < .000]. Subsequent canonical
variates were difficult to interpret.

It can be seen that the canonical analysis gives a useful, though
limited, picture of the internal validity of the two self-report measures.
Again, it should be noted that this procedure examined validity of self-
report at the construct level, where the canonical variates can be taken as
representing the constructs, though perhaps not in the factor analytic
sense.

On the basis of previous research, four factors of the QE scale and
five factors of the EG items were independently extracted and obliquely
rotated to simple structure. The four factors of the QE scale were labeled
1) Personal/Social; 2) Academic/Intellectual; 3) Clubs/Organizations;
4) Science. The five EG factors were labeled 1) Personal/Social; 2)
Science/Technology; 3) General Education; 4) Intellectual; 5) Vocational.
A matrix of Pearson correlations among the factor scores obtained from the
factor analysis is displayed in the next table. Although most of the
correlations are large and significant, those that are highest are among
those factors that have something in common. For example, the gains in
personal and social development factor is most highly correlated with the QE
factor measuring personal and social aspects such as student acquaintances,
personal experiences, and topics of conversation.

Intercorrelations Among Factor Scores

Estimated              Quality of Effort Factors
Gains
Factors             P/S      A/I      C/O      Sci

P/S                 .50      .42      .44      .11
S/T                 .19      .13      .04      .62
G/E                 .42      .45      .31      .07
Intel               .36      .36      .22      .43
Voc                 .29      .29      .24      .25

Two points can be made with regard to the above analyses. First, the
application of multivariate statistical procedures for assessing broad
construct validity of self-report has potential. It should be pointed out,
however, that construct validity in the factor analytic sense was only
explained via the factor score correlations. Secondly, with respect to the
CSEQ, and the QE scales and EG items in particular, evidence does exist for
claiming a certain degree of validity in these self-report measures. The
results of all three analyses present a picture of a questionnaire that is
consistent with respect to self-report predictive validity and self-report
construct validity.

CONCLUDING COMMENTS

This report is obviously not a definitive document about the
credibility of questionnaire survey responses by college students. It has
aimed, nevertheless, to show that there are many ways to confirm the
accuracy, reliability, and validity of student self-reports. It has also
noted, from examples in higher education, and from examples in the larger
area of public opinion polls and other general surveys, some of the common
sources of measurement errors and errors of substance.

In academic surveys the high proportion of students who do not reply
to the questionnaires they have received is a most serious problem. One
wonders whether rigorous follow-up messages would make a big difference, or
whether the magnitude of the non-respondent problem reflects a deeper
rejection of such inquiries. Twenty years ago one could expect about two
thirds of college students to respond to a questionnaire. Today, one is
grateful if 50% respond. Times change. Nearly 50 years ago, in a study I
directed of former university students, including some who had graduated
and some who had not, we got returns from 70% of those who received the
questionnaire. The questionnaire was 52 pages long and took about two
hours to answer. But that was before the invention of television! (Pace,
They Went to College. University of Minnesota Press, 1941).

My own belief is that the likelihood of good returns is enhanced by
the recipients' opinions about the importance of the topic, its perceived
relevance to one's experience, one's regard for the source of the inquiry
and the likely use or value of the results, the clarity of questions and
the ease or confidence one has in answering them, and the overall
attractiveness of the design, format, typography, etc. of the instrument.
I also believe that unless these conditions are reasonably well met, even
vigorous follow-up efforts will have little influence on the response rate,
and even when some increase in response rate is achieved I would be
skeptical about the integrity of those added responses.

Perhaps the second most common weakness in questionnaires by academic
organizations is the inclusion of questions that are quite likely to have
unreliable or invalid answers. These may be questions about vague
concepts, questions about topics that students have not previously thought
about, questions about values or life goals or future plans. Similar
weaknesses are evident in public opinion polls that ask for opinions about
ambiguous or undefined concepts such as national defense, foreign aid,
national health, etc. The unfortunate consequence is that pollsters and
public alike think that the results reflect public attitudes toward the
matter, when in fact the topic is complex, can be phrased in a variety of
proper ways, and all one has done is to tally answers to the particular
question, which is not well or uniformly interpreted in the first place.
Questions about future expectations can be very clear; for example, "Do
you expect to have any (more) children?" But it is difficult to know just
what is being measured or revealed by answers to questions that different
people can interpret in different ways.

A final issue is the use of single questions versus the use of scales
or combinations of questions that can be added together to produce a score
or index. Commercial agencies rely on single items. Scale development is
complex, time-consuming, and costly; and for public opinion polling
agencies the presumed benefit is not worth the price. A scale is not
always better (more reliable and valid) than a single item. In most
academic surveys, however, the topics of inquiry tend to be global
rather than narrowly explicit. In these cases there is merit in thinking
about questionnaire construction in ways somewhat similar to thinking about
test construction.

Whatever the topic of inquiry, it may well be that one of the most
important elements to consider in writing the questions is the nature of
judgment required to answer them. If the judgment or thought process is
one of recall, is the thing or condition to be recalled clear and are the
respondents able to recall accurately? If the judgment is one of
comparison, is the base for the comparison clear and do the respondents
have the experience or knowledge needed to make the comparison with
reasonable confidence? If the judgment or thought process to answer the
questions is one of generalizing or inferring, do the respondents
understand what is to be generalized? Many survey questions would probably
yield better answers if the writers always asked themselves such questions
as: Does the respondent have the knowledge or experience to give a useful
answer? Will different people interpret the question in the same way?
Will the answer be accurate? What can I conclude or interpret from the
answers to this question?

The quality of questionnaire answers (reliability, validity, 
credibility) depends most of all on the quality of the questions. 



